NSF PAR Search | NSF Public Access Repository


Augmenting large language models with chemistry tools

https://doi.org/10.1038/s42256-024-00832-8

Bran, Andres M.; Cox, Sam; Schilter, Oliver; Baldassari, Carlo; White, Andrew D.; Schwaller, Philippe (May 2024, Nature Machine Intelligence)

Large language models (LLMs) have shown strong performance in tasks across domains but struggle with chemistry-related problems. These models also lack access to external knowledge sources, limiting their usefulness in scientific applications. We introduce ChemCrow, an LLM chemistry agent designed to accomplish tasks across organic synthesis, drug discovery and materials design. By integrating 18 expert-designed tools and using GPT-4 as the LLM, ChemCrow augments the LLM performance in chemistry, and new capabilities emerge. Our agent autonomously planned and executed the syntheses of an insect repellent and three organocatalysts and guided the discovery of a novel chromophore. Our evaluation, including both LLM and expert assessments, demonstrates ChemCrow’s effectiveness in automating a diverse set of chemical tasks. Our work not only aids expert chemists and lowers barriers for non-experts but also fosters scientific advancement by bridging the gap between experimental and computational chemistry.
Assessment of chemistry knowledge in large language models that generate code

https://doi.org/10.1039/D2DD00087C

White, Andrew D.; Hocky, Glen M.; Gandhi, Heta A.; Ansari, Mehrad; Cox, Sam; Wellawatte, Geemi P.; Sasmal, Subarna; Yang, Ziyue; Liu, Kangxin; Singh, Yuvraj; et al (April 2023, Digital Discovery)

In this work, we investigate the question: do code-generating large language models know chemistry? Our results indicate, mostly yes. To evaluate this, we introduce an expandable framework for evaluating chemistry knowledge in these models, through prompting models to solve chemistry problems posed as coding tasks. To do so, we produce a benchmark set of problems, and evaluate these models based on correctness of code by automated testing and evaluation by experts. We find that recent LLMs are able to write correct code across a variety of topics in chemistry and their accuracy can be increased by 30 percentage points via prompt engineering strategies, like putting copyright notices at the top of files. Our dataset and evaluation tools are open source which can be contributed to or built upon by future researchers, and will serve as a community resource for evaluating the performance of new models as they emerge. We also describe some good practices for employing LLMs in chemistry. The general success of these models demonstrates that their impact on chemistry teaching and research is poised to be enormous.
Full Text Available
Symmetric Molecular Dynamics

https://doi.org/10.1021/acs.jctc.2c00401

Cox, Sam; White, Andrew D. (June 2022, Journal of Chemical Theory and Computation)
14 examples of how LLMs can transform materials science and chemistry: a reflection on a large language model hackathon

https://doi.org/10.1039/d3dd00113j

Jablonka, Kevin Maik; Ai, Qianxiang; Al-Feghali, Alexander; Badhwar, Shruti; Bocarsly, Joshua D.; Bran, Andres M.; Bringuier, Stefan; Brinson, L. Catherine; Choudhary, Kamal; Circi, Defne; et al (August 2023, Digital Discovery)

Large-language models (LLMs) such as GPT-4 caught the interest of many scientists. Recent studies suggested that these models could be useful in chemistry and materials science. To explore these possibilities, we organized a hackathon. This article chronicles the projects built as part of this hackathon. Participants employed LLMs for various applications, including predicting properties of molecules and materials, designing novel interfaces for tools, extracting knowledge from unstructured data, and developing new educational applications. The diverse topics and the fact that working prototypes could be generated in less than two days highlight that LLMs will profoundly impact the future of our fields. The rich collection of ideas and projects also indicates that the applications of LLMs are not limited to materials science and chemistry but offer potential benefits to a wide range of scientific disciplines.
Full Text Available
